probability simplex
Discrete Flow Maps
Potaptchik, Peter, Yim, Jason, Saravanan, Adhi, Holderrieth, Peter, Vanden-Eijnden, Eric, Albergo, Michael S.
The sequential nature of autoregressive next-token prediction imposes a fundamental speed limit on large language models. While continuous flow models offer a path to parallel generation, they traditionally demand expensive iterative integration. Flow Maps bypass this bottleneck by compressing generative trajectories into single-step mappings, theoretically enabling the generation of full text sequences from noise in a single forward pass. However, standard formulations rely on Euclidean regression losses that are geometrically ill-suited for discrete data. In this work, we resolve this conflict with Discrete Flow Maps, a framework that reconciles trajectory compression with the geometry of the probability simplex. We recast standard flow map training for the discrete domain, aligning the training dynamics with the discrete nature of language. Empirically, this strict geometric alignment allows our method to surpass previous state-of-the-art results in discrete flow modeling.
Estimating Staged Event Tree Models via Hierarchical Clustering on the Simplex
Shoaib, Muhammad, Riccomagno, Eva, Leonelli, Manuele, Varando, Gherardo
Staged tree models enhance Bayesian networks by incorporating context-specific dependencies through a stage-based structure. In this study, we present a new framework for estimating staged trees using hierarchical clustering on the probability simplex, utilizing simplex basesd divergences. We conduct a thorough evaluation of several distance and divergence metrics including Total Variation, Hellinger, Fisher, and Kaniadakis; alongside various linkage methods such as Ward.D2, average, complete, and McQuitty. We conducted the simulation experiments that reveals Total Variation, especially when combined with Ward.D2 linkage, consistently produces staged trees with better model fit, structure recovery, and computational efficiency. We assess performance by utilizing relative Bayesian Information Criterion (BIC), and Hamming distance. Our findings indicate that although Backward Hill Climbing (BHC) delivers competitive outcomes, it incurs a significantly higher computational cost. On the other, Total Variation divergence with Ward.D2 linkage, achieves similar performance while providing significantly better computational efficiency, making it a more viable option for large-scale or time sensitive tasks.
Simplex-Optimized Hybrid Ensemble for Large Language Model Text Detection Under Generative Distribution Drif
Kristanto, Sepyan Purnama, Hakim, Lutfi, Yusuf, Dianni
Abstract--The widespread adoption of large language models (LLMs) has made it difficult to distinguish human writing from machine-produced text in many real applications. Detectors that were effective for one generation of models tend to degrade when newer models or modified decoding strategies are introduced. In this work, we study this lack of stability and propose a hybrid ensemble that is explicitly designed to cope with changing generator distributions. The ensemble combines three complementary components: a RoBERT a-based classifier fine-tuned for supervised detection, a curvature-inspired score based on perturbing the input and measuring changes in model likelihood, and a compact stylometric model built on handcrafted linguistic features. The outputs of these components are fused on the probability simplex, and the weights are chosen via validation-based search. We frame this approach in terms of variance reduction and risk under mixtures of generators, and show that the simplex constraint provides a simple way to trade off the strengths and weaknesses of each branch. Experiments on a 30 000-document corpus drawn from several LLM families including models unseen during training and paraphrased attack variants show that the proposed method achieves 94.2% accuracy and an AUC of 0.978. The ensemble also lowers false positives on scientific articles compared to strong baselines, which is critical in educational and research settings where wrongly flagging human work is costly. Text generated by large language models (LLMs) is now routinely used in homework, reports, programming, and informal communication.
Large-Scale Stochastic Sampling from the Probability Simplex
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that because of this, current SGMCMC methods for the simplex struggle with sparse simplex spaces; when many of the components are close to zero. Unfortunately, many popular large-scale Bayesian models, such as network or topic models, require inference on sparse simplex spaces. To avoid the biases caused by this discretization error, we propose the stochastic Cox-Ingersoll-Ross process (SCIR), which removes all discretization error and we prove that samples from the SCIR process are asymptotically unbiased. We discuss how this idea can be extended to target other constrained spaces. Use of the SCIR process within a SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
Temporal Robustness in Discrete Time Linear Dynamical Systems
Metya, Nilava, Shah, Ankit, Sinha, Arunesh
Discrete time linear dynamical systems, including Markov chains, have found many applications including in security settings such as in cybersecurity operations center (CSOC) management and in managing health risks. However, in these two scenarios, there is uncertainty about the time horizon for which the system runs. This creates uncertainty about the cost (or reward) incurred based on the state distribution when the system stops. Given past data samples of how long a system ran, we theoretically analyze the cost incurred at the stop of the system as a distributional robust cost estimation task in a Wasserstein ambiguity set. Towards this, we show an equivalence between a discrete time Markov Chain on a probability simplex and a global asymptotic stable (GAS) discrete time linear dynamical system, allowing us to base our study on a GAS system only. Then, we provide various polynomial time algorithms and hardness results for different cases in our theoretical study, including a novel proof of a fundamental result about Wassertein distance based polytope. We experiment with real world data in CSOC domain and prior data in health domain to reveal the benefits of our model and approach.
309928d4b100a5d75adff48a9bfc1ddb-Reviews.html
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper extends the stochastic gradient langevin dynamics by using the Reimannian structure and applies it into probability simplex. The idea appears to be quite interesting. But there are several confusing parts that I don't quite get. Maybe the authors can elaborate those a bit.